Arabic Mention Detection: Toward Better Unit of Analysis

نویسندگان

  • Yassine Benajiba
  • Imed Zitouni
چکیده

We investigate in this paper the adequate unit of analysis for Arabic Mention Detection. We experiment different segmentation schemes with various feature-sets. Results show that when limited resources are available, models built on morphologically segmented data outperform other models by up to 4F points. On the other hand, when more resources extracted from morphologically segmented data become available, models built with Arabic TreeBank style segmentation yield to better results. We also show additional improvement by combining different segmentation schemes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Arabic Word Segmentation for Better Unit of Analysis

The Arabic language has a very rich morphology where a word is composed of zero or more prefixes, a stem and zero or more suffixes. This makes Arabic data sparse compared to other languages, such as English, and consequently word segmentation becomes very important for many Natural Language Processing tasks that deal with the Arabic language. We present in this paper two segmentation schemes th...

متن کامل

Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media

Social media allows people interact to express their thoughts or feelings about different subjects. However, some of users may write offensive twits to other via social media which known as cyber bullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "twits" via one of the machine l...

متن کامل

The Impact of Morphological Stemming on Arabic Mention Detection and Coreference Resolution

Arabic presents an interesting challenge to natural language processing, being a highly inflected and agglutinative language. In particular, this paper presents an in-depth investigation of the entity detection and recognition (EDR) task for Arabic. We start by highlighting why segmentation is a necessary prerequisite for EDR, continue by presenting a finite-state statistical segmenter, and the...

متن کامل

Evaluating the efficiency of bank branches with random data

Data Envelopment Analysis (DEA) is a mathematic technique to evaluate the relative efficiency of a group of homogeneous decision making units (DMUs) with multiple inputs and outputs. The efficiency of each unit is measured based on its distance to the production possibility set (PPS). In this paper, the BCC model is used in output-oriented. The average return on profit as output and the covaria...

متن کامل

Multilingual Mention Detection for Coreference Resolution

This paper proposes a novel algorithm for multilingual mention detection: we extract mentions from parse trees via kernelbased SVM learning. Our approach allows for straightforward mention detection for any language where (not necessary perfect) parsing resources are available, without any complex language-specific rule engineering. We also investigate possibilities for incorporating automatica...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010